Analyzing the Cost of a Cache Miss Using Pipeline Spectroscopy
Abstract
We describe a new technique, called Pipeline Spectroscopy, that allows us to precisely measure the cost of each cache miss. Miss costs are plotted as a histogram that gives a detailed view of the cost of every cache miss across all levels of the memory hierarchy. We call these graphs ‘spectrograms’ because they reveal signature characteristics of the processor’s memory hierarchy, of the pipeline, and of the miss pattern itself. We show that in a memory hierarchy with N cache levels (L1, L2, ..., LN, plus memory) and a miss cluster of size C, there are $\binom{C+N}{C}$ possible miss penalties. These correspond to all possible sums over all combinations of the miss latencies, with and without overlap, from each level of the memory hierarchy (L2, L3, ..., memory) for a given cluster size. We also present a theory that describes the shape of a spectrogram, and we use it to predict the shape of spectrograms for larger miss clusters. Next, we provide two examples that use spectroscopy to optimize the processor’s hardware or an application’s software. The first uses a miss spectrogram to improve the software design of an application; the second uses a miss spectrogram to analyze bus queuing. Our experiments show that performance gains of up to 8% are possible. Detailed analysis of a spectrogram yields much greater insight into pipeline dynamics, including the effects of prefetching and miss queuing delays.
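The count of possible miss penalties, $\binom{C+N}{C}$, can be checked by brute force. The sketch below is our reading of where the count comes from, not the paper's own derivation. It assumes that each miss in a cluster either adds nothing because its latency is fully hidden by overlap, or adds the full latency of the level that services it (L2, ..., LN, memory). That gives N + 1 choices per miss. Order within the cluster does not matter, so each outcome is a multiset of size C drawn from N + 1 values, and there are exactly $\binom{C+N}{C}$ such multisets. The function name `possible_penalties` and the latency values are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def possible_penalties(level_latencies, cluster_size):
    """Enumerate the ways a cluster of misses can combine level latencies.

    Assumption (illustrative): each miss contributes either 0 cycles
    (fully overlapped) or the latency of the level that services it.
    Returns the list of outcomes (as index multisets) and the sorted
    distinct total penalties in cycles.
    """
    # Index 0 stands for "overlapped, no added penalty";
    # indices 1..N stand for the servicing levels L2, ..., LN, memory.
    choices = [0] + list(level_latencies)
    # Enumerate by index so that two levels with equal latency still
    # count as distinct choices, matching the binomial count.
    outcomes = list(combinations_with_replacement(range(len(choices)), cluster_size))
    sums = sorted({sum(choices[i] for i in combo) for combo in outcomes})
    return outcomes, sums


# Hypothetical latencies in cycles for N = 3 miss sources: L2, L3, memory.
latencies = [12, 40, 200]
C = 2
N = len(latencies)
outcomes, sums = possible_penalties(latencies, C)
print(len(outcomes), comb(C + N, C))  # 10 10
print(sums)  # [0, 12, 24, 40, 52, 80, 200, 212, 240, 400]
```

The code counts combinations, not distinct cycle values. If two different combinations happen to add up to the same number of cycles, for example when one latency is an exact multiple of another, a measured spectrogram would show fewer separate peaks than $\binom{C+N}{C}$.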
Similar resources
Measuring The Cost Of A Cache Miss
It is vital that the cost of a cache miss be accurately measured in order for many hardware and software optimizations to occur. In this paper we describe a new technique, called pipeline spectroscopy, that allows pipeline delays to be monitored and analyzed in detail. We apply this technique to produce a cache miss ‘spectrogram’, which represents a precise readout showing a detailed histogram ...
DSTRIDE: Data-Cache Miss-Address-Based Stride Prefetching Scheme for Multimedia Processors
Prefetching reduces cache miss latency by moving data up in memory hierarchy before they are actually needed. Recent hardware-based stride prefetching techniques mostly rely on the processor pipeline information (e.g. program counter and branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design and require that existi...
Predictive Sequential Associative Cache
Traditionally, set-associative caches are implemented by comparing all blocks in a cache set in parallel for each reference and then selecting the desired block from the set. By providing more than one location for holding the data for a particular memory address, set associativity reduces the cache miss rate for most programs. The traditional solution is, however, not without cost. As contrast...
A multithreaded
This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC processor, and its memory system. Because this processor is used only in IBM iSeries and pSeries commercial servers, it is optimized solely for commercial server workloads. Increasing miss rates because of trends in commercial server applications and increasing latency of cache misses because of rapidly increasin...
Architectural and implementation tradeoffs in the design of multiple-context processors
Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. In this paper, we examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability...
Journal title:
- J. Instruction-Level Parallelism
Volume 10
Publication date 2008